Add agent_policy_id and policy_revision_idx to checkin requests #9931

michel-laterman · 2025-09-12T21:08:40Z

What does this PR do?

Add the agent_policy_id and policy_revision_idx attributes to checkin requests.
These attributes are sourced from the action stored as part of the state.
Add a feature flag to disable sending acks for policy change actions; the default behaviour for policy change acks is unchanged by this addition (they are still sent unless the flag is set).
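
As a rough illustration of the change described above, here is a minimal sketch of a checkin request body carrying the two new attributes. Only the JSON names `agent_policy_id` and `policy_revision_idx` come from this PR; the `checkinRequest` and `policyChange` types, the `status` field, and the helper functions are hypothetical stand-ins, not the actual elastic-agent code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkinRequest is a hypothetical, trimmed-down request body. The real
// request type in elastic-agent has more fields and may be shaped differently.
type checkinRequest struct {
	Status            string `json:"status"`
	AgentPolicyID     string `json:"agent_policy_id,omitempty"`
	PolicyRevisionIdx int64  `json:"policy_revision_idx,omitempty"`
}

// policyChange is a hypothetical stand-in for the policy change action
// that the agent keeps as part of its persisted state.
type policyChange struct {
	PolicyID    string
	RevisionIdx int64
}

// newCheckinRequest copies the policy details from the last stored action.
// When no action is stored, both attributes are left out of the request.
func newCheckinRequest(status string, last *policyChange) checkinRequest {
	req := checkinRequest{Status: status}
	if last != nil {
		req.AgentPolicyID = last.PolicyID
		req.PolicyRevisionIdx = last.RevisionIdx
	}
	return req
}

// encode renders the request as JSON, returning "" on a marshalling error.
func encode(req checkinRequest) string {
	b, err := json.Marshal(req)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	last := &policyChange{PolicyID: "policy-1", RevisionIdx: 3}
	fmt.Println(encode(newCheckinRequest("online", last)))
}
```

In this sketch the attributes are omitted when the state holds no policy change action, which is one plausible choice; the PR text does not say how that case is handled.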

Why is it important?

The policy information in fleet-server and agent may go out of sync; this may occur in cases where a VM restores from a snapshot.
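
To make the out-of-sync case concrete, the sketch below compares the policy reference an agent reports at checkin with the one the server has on record. How fleet-server actually uses the reported values is not described in this PR; `policyRef` and `outOfSync` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// policyRef is a hypothetical pairing of policy ID and revision index.
type policyRef struct {
	ID          string
	RevisionIdx int64
}

// outOfSync reports whether the policy the agent says it is running differs
// from the one the server believes it delivered (assumed comparison logic).
func outOfSync(reported, recorded policyRef) bool {
	return reported.ID != recorded.ID || reported.RevisionIdx != recorded.RevisionIdx
}

func main() {
	// A VM restored from a snapshot may report an older revision than the
	// server has recorded for it.
	recorded := policyRef{ID: "policy-1", RevisionIdx: 5}
	reported := policyRef{ID: "policy-1", RevisionIdx: 3}
	fmt.Println(outOfSync(reported, recorded))
}
```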

Checklist

I have read and understood the pull request guidelines of this project.
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~I have made corresponding changes to the documentation~~
~~I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool
I have added an integration test or an E2E test

Disruptive User Impact

N/A

Related issues

Closes Fleet check-in should send policy_id and revision #6446

michel-laterman · 2025-09-12T21:09:51Z

Adding integration/e2e tests requires the fleet-server to be implemented first: elastic/fleet-server#5501
I'll keep this as a draft until the above PR is merged.

I'm not changing the default behaviour for the agent with regards to acks.

Add the agent_policy_id and policy_revision_idx attributes to checkin requests.

internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go

elastic-sonarqube · 2025-09-22T18:59:28Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
85.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

elasticmachine · 2025-09-22T19:36:07Z

💚 Build Succeeded

Buildkite Build
Commit: 67a3b80

History

💔 Build #27277 failed 85cfaef
💔 Build #27271 failed b585e91
💔 Build #27268 failed fe7ba4e
💔 Build #27133 failed 9535385
💔 Build #26761 failed cfb5df4

cc @michel-laterman

elasticmachine · 2025-09-22T19:44:14Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

pkg/features/features.go

blakerouse

Overall this looks good, but I have 2 questions.

internal/pkg/agent/application/gateway/fleet/fleet_gateway.go

pkg/features/features.go

blakerouse

Looks good. Thanks for the clarification.

michel-laterman · 2025-09-23T18:02:56Z

elastic-agent/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go

Lines 161 to 176 in ab68480

    
    case <-f.scheduler.WaitTick():
        f.log.Debug("FleetGateway calling Checkin API")
        // Execute the checkin call and for any errors returned by the fleet-server API
        // the function will retry to communicate with fleet-server with an exponential delay and some
        // jitter to help better distribute the load from a fleet of agents.
        resp, err := f.doExecute(ctx, requestBackoff)
        if err != nil {
            continue
        }
        actions := make([]fleetapi.Action, len(resp.Actions))
        copy(actions, resp.Actions)
        if len(actions) > 0 {
            f.actionCh <- actions
        }

After fleet checkin, the agent sends all actions through a channel to the dispatcher. They are executed concurrently with the checkin loop; the ticker by default has a 1s duration with up to 500ms jitter.
There is no guarantee that the POLICY_CHANGE action is executed before the next checkin.
cc @blakerouse

blakerouse · 2025-09-23T20:39:44Z

@michel-laterman Thanks for the clarification from the call today. I don't think this should be an issue with this PR, but we might want to make just the policy change blocking, at least until we know it's either applied or not applied. That would really reduce the load on Fleet Server, could be a scale improvement really.

We could do something like:

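    // Wait up to 5 seconds for the policy change to be applied (or rejected)
    // before moving on to the next checkin.
    ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()
    waitForPolicyApply := f.handleActions(actions)
    select {
    case <-waitForPolicyApply:
    case <-ctx.Done():
    }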

michel-laterman · 2025-09-24T15:20:22Z

Created #10130 to track

blakerouse · 2025-09-24T20:00:04Z

@michel-laterman Thanks!

* upstream: (505 commits) Update journald tests now that Filebeat supports watching folders (#10131) [deploy/kubernetes]: add info about hostPID for Universal Profiling (#10173) Fall back to process runtime if otel runtime is unsupported (#10087) Conditionall check for ms_tls13kdf build tag (#10160) [docs][edot] add entry for profiles (#10163) edot/docs: add support for profiles (#10146) Add Logstash exporter (#10137) Add back publish to serverless. (#10159) Improve Integration test documentation (#10155) Fix multiarch service image push from main to serverless (#10129) Forward migrate action to endpoint (#9801) Comment out check for ms_tls13kdf tag for FIPS-capable binaries (#10148) [otel] add receivers: apache, iis, mysql, postgresql, sqlserver v0.135.0 (#9344) Add k8sevents receiver in kube-stack (#10086) feat: emit system resource metrics for EDOT subprocess (#10003) [AutoOps] Configure OTel Exporter to Send Maximum-sized Batches (#10126) keep enrollment token when replacing data with signed (#10115) Revert "Publish `elastic-agent-service` container directly to serverless from main (#9583)" (#10127) Add agent_policy_id and policy_revision_idx to checkin requests (#9931) remove resource/k8s processor and use k8sattributes processor for service attributes (#10108) ...

michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-skip labels Sep 12, 2025

mergify bot assigned michel-laterman Sep 12, 2025

Add agent_policy_id and policy_revision_idx to checkin requests

f3c044b

Add the agent_policy_id and policy_revision_idx attributes to checkin requests.

michel-laterman force-pushed the feat/checkin-policy-details branch 2 times, most recently from 0721e6f to cfb5df4 Compare September 12, 2025 22:25

michel-laterman commented Sep 12, 2025

internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go

Remove policy change action acks

9535385

michel-laterman force-pushed the feat/checkin-policy-details branch from cfb5df4 to 9535385 Compare September 18, 2025 20:24

michel-laterman added 3 commits September 22, 2025 09:00

Add ForcePolicyChangeAcks feature flag

fe7ba4e

Change FF to explicitly disable acks

b585e91

Fix feature flag

67a3b80

michel-laterman force-pushed the feat/checkin-policy-details branch from 85cfaef to 67a3b80 Compare September 22, 2025 17:24

michel-laterman marked this pull request as ready for review September 22, 2025 19:44

michel-laterman requested a review from a team as a code owner September 22, 2025 19:44

michel-laterman requested review from ycombinator and straistaru September 22, 2025 19:44

michel-laterman commented Sep 22, 2025

pkg/features/features.go

blakerouse reviewed Sep 22, 2025

internal/pkg/agent/application/gateway/fleet/fleet_gateway.go

pkg/features/features.go

blakerouse approved these changes Sep 23, 2025


michel-laterman mentioned this pull request Sep 24, 2025

[Fleet] Expose agent.features.disable_policy_change_acks in fleet settings elastic/kibana#236327

Open

michel-laterman merged commit f2c4cfa into elastic:main Sep 24, 2025
23 checks passed

michel-laterman deleted the feat/checkin-policy-details branch September 24, 2025 15:17
